Computer vision - Lab 5¶
Agenda¶
- the image in the frequency domain,
- image analysis in the frequency domain,
- high-pass and low-pass filtering,
- speeding up convolution-based filtering using the frequency domain.
Helpers¶
To perform the tasks, we first import the libraries used in this script and download the data we will be working on.
In this script we will be using:
- the Lenna image (available at the link) - one of the most widely used test images in the history of image processing and compression research,
# import the libraries used in this script
import cv2
import matplotlib.pyplot as plt
import numpy as np
import PIL
%matplotlib inline
from pandas import DataFrame
import pandas as pd
from IPython.display import display, HTML
from skimage.exposure import rescale_intensity
import plotly.graph_objects as go
import json
import os
pd.options.display.html.border = 0
pd.options.display.float_format = '{:,.2f}'.format
# download Lenna image
!wget -O lena_std.tif http://www.lenna.org/lena_std.tif
--2024-11-13 20:23:06--  http://www.lenna.org/lena_std.tif
Resolving www.lenna.org (www.lenna.org)... 107.180.37.106
Connecting to www.lenna.org (www.lenna.org)|107.180.37.106|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 786572 (768K) [image/tiff]
Saving to: ‘lena_std.tif’

lena_std.tif        100%[===================>] 768.14K  1.03MB/s    in 0.7s

2024-11-13 20:23:07 (1.03 MB/s) - ‘lena_std.tif’ saved [786572/786572]
The Colab platform requires a special way to display images with OpenCV. If the notebook is run in Colab, the following code selects the appropriate display function:
if "google.colab" in str(get_ipython()):
from google.colab.patches import cv2_imshow
imshow = cv2_imshow
else:
def imshow(img):
img = img.clip(0, 255).astype("uint8")
if img.ndim == 3:
if img.shape[2] == 4:
img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGBA)
else:
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
display(PIL.Image.fromarray(img))
def h_color(a, interpolation=None, size=None, fy=1.5, fx=1.5, cmap="gray"):
s = [int(a.shape[0] * fy), int(a.shape[1] * fx)] if size is None else size
plt.figure(figsize=s)
plt.tick_params(
axis="both",
which="both",
bottom=False,
top=False,
labelbottom=False,
labelleft=False,
left=False,
right=False,
)
plt.imshow(a, cmap=cmap, interpolation=interpolation)
css = """
<style type="text/css">
table, td, table.dataframe, table.dataframe td {
border: 1px solid black; //border: double;
border-collapse: collapse;
border-style: solid;
border-spacing: 0px;
background-color: rgb(250,250,250);
width: 24px;
height: 24px;
text-align: center;
transform: scale(1.0);
margin: 5px;
}
</style>
"""
def h(s):
return display(HTML(css + DataFrame(s).to_html(header=False, index=False)))
def h_color_3d(z):
fig = go.Figure(data=[go.Surface(z=z)])
fig.update_layout(autosize=False, width=500, height=500)
fig.show()
Image in the frequency domain¶
Previous examples of image processing operated in the intensity domain. Another representation of an image is the frequency domain. A frequency-domain image should be read as information about how strongly patterns of each spatial frequency repeat across the image.
For example, a very blurry image is dominated by slowly varying, low-frequency content (blur smooths neighboring pixels into repeating, gradual patterns). A very sharp image, on the other hand, contains much more high-frequency content, because neighboring pixels differ strongly from each other.
An example of the transition from the intensity domain to the frequency domain is shown below.
img = cv2.imread("./lena_std.tif", 1)
img_grayscale = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
imshow(img)
# imshow(img_grayscale)
Fourier transform¶
The Fourier transform is given by the following formula:
$${\mathcal{F}}(\omega) = \int_{-\infty}^{\infty} f(x) e^{-2\pi i x \omega} dx = \int_{-\infty}^{\infty} f(x) (\cos(2\pi x \omega) - i \sin(2\pi x \omega)) dx $$
for 2d signal:
$${\mathcal{F}}(u,v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) e^{-2\pi i (ux+vy)} dx dy$$
Where $u,v$ are frequencies, $f(x,y)$ is the image on an intensity scale, and $i$ is the imaginary unit ($i^2 = -1$). The above formula can be interpreted as follows:
- ${\mathcal{F}}(u,v)$ is computed separately for each pair of frequencies $u$ and $v$,
- for each ${\mathcal{F}}(u,v)$, the input image is multiplied element-wise by the complex exponential (with the corresponding $u,v$) and the result is summed. This can also be viewed as a weighted sum over all pixels, where the weights are given by the exponential term in the formula.
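The weighted-sum interpretation above can be made concrete with a naive discrete 2D DFT. This is an illustrative sketch (the `naive_dft2` helper is ours, not part of any library); it is checked against `np.fft.fft2` on a tiny array, since the direct double loop is far too slow for real images:

```python
import numpy as np

def naive_dft2(f):
    """Direct 2D DFT: each F(u, v) is a weighted sum over all pixels."""
    H, W = f.shape
    F = np.zeros((H, W), dtype=complex)
    y, x = np.mgrid[0:H, 0:W]  # pixel coordinate grids
    for u in range(H):
        for v in range(W):
            # weight each pixel by e^{-2*pi*i*(u*y/H + v*x/W)} and sum
            F[u, v] = np.sum(f * np.exp(-2j * np.pi * (u * y / H + v * x / W)))
    return F

tiny = np.arange(16, dtype=float).reshape(4, 4)
print(np.allclose(naive_dft2(tiny), np.fft.fft2(tiny)))  # True
```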
The transformation from the intensity domain to the frequency domain is performed using the Fourier transform, which is implemented in OpenCV or NumPy. An example of using the NumPy library is presented below.
The fft2 function performs the Fourier transform of a 2-dimensional signal. Then, using the fftshift function, we rearrange the data so that pixels corresponding to low frequencies lie near the center and high-frequency pixels lie at the edges of the (frequency-domain) image.
f = np.fft.fft2(img_grayscale)
fshift = np.fft.fftshift(f)
magnitude_spectrum = 20 * np.log(np.abs(fshift))  # log-magnitude (np.log is the natural log, so not strictly dB)
The inverse transform comes down to undoing the pixel shift (ifftshift) and applying ifft2, which yields an image in the intensity domain (it is necessary to take the real part with real(), because the result is complex).
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)
imshow(np.concatenate([img_grayscale, magnitude_spectrum, img_back], 1))
Interpretation of the frequency domain¶
Having the inverse transform (frequency -> intensity), we can do a few experiments to approximate the relationship between pixel values in the frequency domain and the intensity.
def get_mask(s, div):
mask = np.zeros(s, np.float32)
return cv2.circle(mask, (s[0] // 2, s[1] // 2), s[0] // div, 1, -1)
def fft(img, size=None):
f = np.fft.fft2(img, size)
fshift = np.fft.fftshift(f)
spectrum = 20 * np.log(np.abs(fshift) + 1e-9)  # small constant avoids log(0) warnings
return fshift, spectrum
def ifft(fshift):
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
return np.real(img_back)
def showFreqAndImages(frequencies, images):
fig, axes = plt.subplots(2, len(frequencies), figsize=(20, 10))
for i, (freq, img) in enumerate(zip(frequencies, images)):
axes[0, i].imshow(freq, cmap="gray", vmin=0, vmax=255)
axes[1, i].imshow(img, cmap="gray", vmin=0, vmax=255)
plt.show()
Let's define a 256 x 256 image filled with zeros, with a high value set at certain positions. These coordinates will be respectively (where $S_h = height / 2 = 128$ and $S_w = width / 2 = 128$):
- $(S_h, S_w)$
- $(S_h, S_w - 1)$
- $(S_h, S_w + 1)$
- $(S_h, S_w - 1)$, $(S_h, S_w + 1)$
sw = 256 // 2
sh = 256 // 2
freq_1 = np.zeros([256, 256], np.float32)
freq_1[sh, sw] = 10000000.0
freq_2 = np.zeros([256, 256], np.float32)
freq_2[sh, sw - 1] = 10000000.0
freq_3 = np.zeros([256, 256], np.float32)
freq_3[sh, sw + 1] = 10000000.0
freq_4 = np.zeros([256, 256], np.float32)
freq_4[sh, sw - 1] = 10000000.0
freq_4[sh, sw + 1] = 10000000.0
img_i_1 = ifft(freq_1)
img_i_2 = ifft(freq_2)
img_i_3 = ifft(freq_3)
img_i_4 = ifft(freq_4)
showFreqAndImages(
[freq_1, freq_2, freq_3, freq_4], [img_i_1, img_i_2, img_i_3, img_i_4]
)
Similarly, for changing pixel values vertically:
- $(S_h, S_w)$
- $(S_h - 1, S_w)$
- $(S_h + 1, S_w)$
- $(S_h - 1, S_w)$, $(S_h + 1, S_w)$
sw = 256 // 2
sh = 256 // 2
freq_1 = np.zeros([256, 256], np.float32)
freq_1[sh, sw] = 10000000.0
freq_2 = np.zeros([256, 256], np.float32)
freq_2[sh - 1, sw] = 10000000.0
freq_3 = np.zeros([256, 256], np.float32)
freq_3[sh + 1, sw] = 10000000.0
freq_4 = np.zeros([256, 256], np.float32)
freq_4[sh - 1, sw] = 10000000.0
freq_4[sh + 1, sw] = 10000000.0
img_i_1 = ifft(freq_1)
img_i_2 = ifft(freq_2)
img_i_3 = ifft(freq_3)
img_i_4 = ifft(freq_4)
showFreqAndImages(
[freq_1, freq_2, freq_3, freq_4], [img_i_1, img_i_2, img_i_3, img_i_4]
)
The above experiment leads to the following conclusions:
- the middle pixel corresponds to a uniform image (frequency equal to zero),
- pixels offset by 1 to the left and to the right produce the same result, because they correspond to signals with opposite frequencies,
- horizontal changes in the frequency domain correspond to horizontal 2D sinusoids in the intensity-domain image; similarly for vertical changes.
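The "opposite frequencies" observation in the second conclusion is the conjugate (Hermitian) symmetry of the spectrum of any real-valued image: $F(-u,-v) = \overline{F(u,v)}$. A minimal NumPy check of this property (the mirroring index gymnastics below are our own sketch, assuming nothing beyond `numpy.fft`):

```python
import numpy as np

# For a real image, the spectrum is conjugate-symmetric: F(-u, -v) = conj(F(u, v)).
# This is why points mirrored through the center give the same inverse-transform image.
rng = np.random.default_rng(0)
img = rng.random((8, 8))            # any real-valued image
F = np.fft.fft2(img)

# flipping both axes and rolling by one maps index u to (N - u) mod N on each axis
F_mirrored = np.roll(np.flip(F), 1, axis=(0, 1))
print(np.allclose(F_mirrored, np.conj(F)))  # True
```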
By setting high values further from the center, the inverse Fourier transform operation will create intensity-domain images containing signals of higher frequency.
sw = 256 // 2
sh = 256 // 2
freq_1 = np.zeros([256, 256], np.float32)
freq_1[sh - 1, sw] = 10000000.0
freq_1[sh + 1, sw] = 10000000.0
freq_1[sh, sw - 1] = 10000000.0
freq_1[sh, sw + 1] = 10000000.0
freq_2 = np.zeros([256, 256], np.float32)
freq_2[sh - 1, sw - 1] = 10000000.0
freq_2[sh + 1, sw + 1] = 10000000.0
freq_2[sh - 1, sw + 1] = 10000000.0
freq_2[sh + 1, sw - 1] = 10000000.0
freq_3 = np.zeros([256, 256], np.float32)
freq_3[sh, sw - 2] = 10000000.0
freq_4 = np.zeros([256, 256], np.float32)
freq_4[sh, sw - 8] = 10000000.0
img_i_1 = ifft(freq_1)
img_i_2 = ifft(freq_2)
img_i_3 = ifft(freq_3)
img_i_4 = ifft(freq_4)
imshow(np.concatenate([freq_1, freq_2, freq_3, freq_4], 1))
imshow(np.concatenate([img_i_1, img_i_2, img_i_3, img_i_4], 1))
Operations in the frequency domain¶
Image processing in the frequency domain is closely related to the concept of intensity-domain convolution. According to the convolution theorem for the Fourier transform (link) we can write:
$$(f \bullet g)(t) = {\mathcal{F}}^{-1}\{F * G\}$$
where:
- $f$, $F$ - input image in the intensity and frequency domains, respectively,
- $g$, $G$ - kernel / filter (e.g. Laplacian) in the intensity and frequency domains, respectively,
- ${\mathcal{F}^{-1}}$ - inverse Fourier transform
- $\bullet$ - convolution
- $*$ - multiplication
The theorem states that the convolution of two functions in the time or spatial domain is equivalent to the element-wise multiplication of their representations in the frequency domain.
Note: In the case of the Discrete Fourier Transform (DFT), multiplying the spectra corresponds to circular convolution, where the signal wraps around at the edges instead of extending linearly. This is generally not the desired outcome for linear convolution, as it leads to wrap-around artifacts and incorrect values near the borders. To avoid this, both signals are zero-padded before the FFT to size $N+M-1$, where $N$ is the size of the signal and $M$ is the size of the kernel.
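The circular-versus-linear distinction is easy to see in 1D. A short sketch using only `numpy.fft` (the arrays are arbitrary example data): multiplying unpadded spectra wraps the result around, while padding to $L = N+M-1$ reproduces ordinary linear convolution:

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])  # N = 4
kernel = np.array([1.0, 1.0, 1.0])       # M = 3

# no padding -> circular convolution (the kernel wraps around at the edges)
circular = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel, 4)))

# padding to L = N + M - 1 = 6 -> matches ordinary (linear) convolution
L = len(signal) + len(kernel) - 1
linear = np.real(np.fft.ifft(np.fft.fft(signal, L) * np.fft.fft(kernel, L)))

print(np.allclose(linear, np.convolve(signal, kernel)))        # True
print(np.allclose(circular, np.convolve(signal, kernel)[:4]))  # False: wrap-around
```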
Computational complexity of convolution
For two sequences of lengths $N$ and $M$, direct convolution multiplies each element of one sequence with every element of the other and sums the products for each shift. This gives a time complexity of $O(NM)$.
For FFT-based convolution with padding to $L = N+M-1$, the complexity is $O(L \log L)$.
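A rough timing sketch of this trade-off (absolute numbers depend on hardware and on the sizes chosen here, which are arbitrary; `np.convolve` serves as the direct $O(NM)$ route):

```python
import numpy as np
import time

rng = np.random.default_rng(0)
sig = rng.random(2 ** 15)   # N = 32768
ker = rng.random(2 ** 10)   # M = 1024

# direct route: O(N * M) multiply-adds
t0 = time.perf_counter()
direct = np.convolve(sig, ker)
t_direct = time.perf_counter() - t0

# FFT route with padding to L = N + M - 1: O(L log L)
L = len(sig) + len(ker) - 1
t0 = time.perf_counter()
via_fft = np.real(np.fft.ifft(np.fft.fft(sig, L) * np.fft.fft(ker, L)))
t_fft = time.perf_counter() - t0

# both routes compute the same linear convolution
print(np.allclose(direct, via_fft))  # True
print(f"direct: {t_direct:.4f}s, fft: {t_fft:.4f}s")
```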
The earlier transformations between domains showed that an image of a given size (in the intensity domain) has the same size in the frequency domain. This size determines the set of representable frequencies, so to represent a smaller image over a larger number of frequencies, the image can be zero-padded (as in the case of convolution).
With the OpenCV implementation you need to pad manually, while with NumPy it is enough to pass the target size as the second parameter of the fft2 function.
mean_filter = np.ones((3, 3))
x = cv2.getGaussianKernel(5, 10)
gaussian = x * x.T
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])
filters = [mean_filter, gaussian, laplacian]
fft_filters = [np.fft.fft2(x, (256, 256)) for x in filters]
fft_shift = [np.fft.fftshift(y) for y in fft_filters]
mag_spectrum = [np.log(np.abs(z) + 1) for z in fft_shift]
imshow(np.concatenate(mag_spectrum, 1) * 255)
Having images in the same domain defined by the same number of frequencies, according to the theory cited, we can perform the equivalent of the convolution operation using simple multiplication in the frequency domain.
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])
f_lap_shift, f_lap_mag = fft(laplacian, (512, 512))
fshift, spectrum = fft(img_grayscale)
img_i = np.real(ifft(fshift * f_lap_shift))
imshow(np.concatenate([img_grayscale, np.real(spectrum * f_lap_shift), img_i], 1))
img_i = cv2.filter2D(img_grayscale, -1, laplacian)
imshow(np.concatenate([img_grayscale, img_i], 1))
Low Pass Filter¶
Frequency domain filtering can also be divided into low pass and high pass.
Low-pass filtering passes low-frequency signals through and cuts off signals above a certain threshold. As shown earlier, after the fftshift the high-frequency components lie at the edges of the spectrum, and the closer to the center, the lower the frequency.
Low-pass filtering implemented as a hard cut-off is shown below.
m1 = get_mask(img_grayscale.shape, 2)
m2 = get_mask(img_grayscale.shape, 8)
m3 = get_mask(img_grayscale.shape, 16)
fshift, spectrum = fft(img_grayscale)
img_back_1 = ifft(fshift * m1)
img_back_2 = ifft(fshift * m2)
img_back_3 = ifft(fshift * m3)
imshow(np.concatenate([img_grayscale, spectrum * m1, img_back_1], 1))
imshow(np.concatenate([img_grayscale, spectrum * m2, img_back_2], 1))
imshow(np.concatenate([img_grayscale, spectrum * m3, img_back_3], 1))
The result of a low-pass filter is usually a blur effect, because we remove the areas of high variability (edges and details, but also noise).
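A side effect of the hard circular cut-off above is ringing (Gibbs) artifacts around edges. A smoothly decaying mask avoids this; a minimal sketch of a Gaussian low-pass mask, self-contained with NumPy (the `gaussian_lowpass_mask` helper and the `sigma` value are our own illustrative choices):

```python
import numpy as np

def gaussian_lowpass_mask(shape, sigma):
    """Gaussian mask centered on the (shifted) spectrum: 1 at DC, smooth falloff."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    d2 = (y - h // 2) ** 2 + (x - w // 2) ** 2  # squared distance from center
    return np.exp(-d2 / (2.0 * sigma ** 2))

img = np.random.default_rng(0).random((64, 64))
fshift = np.fft.fftshift(np.fft.fft2(img))
mask = gaussian_lowpass_mask(img.shape, sigma=8.0)
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(fshift * mask)))
```

Because the mask equals 1 at the center (DC component) and decays smoothly, the mean brightness is preserved while high-frequency content is attenuated without a sharp boundary in the spectrum.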
High Pass Filter¶
Filters from the high-pass family work analogously. By passing only high-frequency signals, we extract edges, fine detail, or simply noise.
A high-pass filter implemented as cutting off signals below a certain threshold is presented below.
m1 = 1 - get_mask(img_grayscale.shape, 2)
m2 = 1 - get_mask(img_grayscale.shape, 16)
m3 = 1 - get_mask(img_grayscale.shape, 32)
fshift, spectrum = fft(img_grayscale)
img_back_1 = ifft(fshift * m1)
img_back_2 = ifft(fshift * m2)
img_back_3 = ifft(fshift * m3)
imshow(np.concatenate([img_grayscale, spectrum * m1, img_back_1], 1))
imshow(np.concatenate([img_grayscale, spectrum * m2, img_back_2], 1))
imshow(np.concatenate([img_grayscale, spectrum * m3, img_back_3], 1))
print(np.min(m1))
print(spectrum)
0.0
[[136.74665629 151.51819936 117.92560389 ... 132.37976908 117.92560389 151.51819936]
 [136.60731322 128.97338507 125.94892297 ... 133.81906998 141.28344555 145.63716725]
 [132.74891631 128.02355302 138.03274675 ... 144.21162894 133.8756532  128.42548793]
 ...
 [133.52381331 145.79876858 138.13056323 ... 131.54328338 127.05250671 113.48637693]
 [132.74891631 128.42548793 133.8756532  ... 116.78171714 138.03274675 128.02355302]
 [136.60731322 145.63716725 141.28344555 ... 147.82437841 125.94892297 128.97338507]]
Task 1¶
Check and justify which of the filters listed below is low-pass and which is high-pass.
Filters:
- averaging,
- Gaussian,
- Sobel,
- Laplacian.
Averaging Filter¶
n = 11
avg = np.full((n, n), 1 / (n * n))
img_i = cv2.filter2D(img_grayscale, -1, avg)  # no need to flip the kernel because it is symmetric
fft_avg_filter = np.fft.fft2(avg, (512, 512))
fft_shifted = np.fft.fftshift(fft_avg_filter)
spectrum = np.log(np.abs(fft_shifted) + 1)
imshow(np.concatenate([img_grayscale, img_i, spectrum * 255], 1))
Conclusion: the averaging filter is a low-pass filter, because it blurs the image. In other words, it passes low frequencies - those close to the center (the white circle visible in the filter's spectrum).
Gaussian Filter¶
n = 21
sig = 3.0
center = (n - 1)/2
gaus = np.fromfunction(lambda x, y: (1 / (2 * np.pi * sig**2)) * np.exp(-((x - center)**2 + (y - center)**2) / (2 * sig**2)), (n, n))
gaus /= np.sum(gaus)
fft_gaus_filter = np.fft.fft2(gaus, (512, 512))
fft_shifted = np.fft.fftshift(fft_gaus_filter)
spectrum = np.log(np.abs(fft_shifted) + 1)
img_i = cv2.filter2D(img_grayscale, -1, gaus)  # no need to flip the kernel because it is symmetric
imshow(np.concatenate([img_grayscale, img_i, spectrum * 255], 1))
Conclusion: the Gaussian filter is a low-pass filter, because it blurs the image. In other words, it passes low frequencies - those close to the center (the white circle visible in the filter's spectrum).
Sobel Filter¶
sobel_x = cv2.Sobel(img_grayscale, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient (vertical edges)
sobel_y = cv2.Sobel(img_grayscale, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient (horizontal edges)
sobel_combined = cv2.magnitude(sobel_x, sobel_y)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
fft_sx_filter = np.fft.fft2(sobel_x, (512, 512))
fft_shifted = np.fft.fftshift(fft_sx_filter)
spectrum1 = np.log(np.abs(fft_shifted) + 1)
fft_sy_filter = np.fft.fft2(sobel_y, (512, 512))
fft_shifted = np.fft.fftshift(fft_sy_filter)
spectrum2 = np.log(np.abs(fft_shifted) + 1)
imshow(np.concatenate([img_grayscale, sobel_combined, spectrum1 * 255, spectrum2 * 255], 1))
Conclusion: the Sobel filter is a high-pass filter, because it preserves edges. In other words, it passes high frequencies - values further from the center - and filters out low frequencies - those near the center (the black stripe between the two elliptic white shapes visible in the filter's spectrum).
Laplacian Filter¶
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])
fft_lap_filter = np.fft.fft2(laplacian, (512, 512))
fft_shifted = np.fft.fftshift(fft_lap_filter)
spectrum = np.log(20 * np.abs(fft_shifted) + 1)  # +1 avoids log(0)
img_i = cv2.filter2D(img_grayscale, -1, laplacian)  # no need to flip the kernel because it is symmetric
imshow(np.concatenate([img_grayscale, img_i, spectrum * 255], 1))
Conclusion: the Laplacian filter is a high-pass filter, because it preserves edges. In other words, it passes high frequencies - those close to the edges of the spectrum (the white area surrounding the large black dot visible in the filter).
Task 2¶
Perform edge detection (vertically and horizontally, separately) using the Sobel filters in the frequency domain, then combine the detected features into one image and present the results. Compare the frequency-domain result with the intensity-domain result of the Sobel filter. Note: the resulting images from both methods should be the same.
sobel_x = cv2.Sobel(img_grayscale, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient (vertical edges)
sobel_y = cv2.Sobel(img_grayscale, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient (horizontal edges)
sobel_intensity_combined = cv2.magnitude(sobel_x, sobel_y)
imshow(np.concatenate([img_grayscale, sobel_x, sobel_y, sobel_intensity_combined], 1))
sobel_x_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y_kernel = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
fshift, spectrum = fft(np.pad(img_grayscale, pad_width=10, mode='reflect'))  # reflect (or edge) padding; zero padding is not suitable here - risk of edge artifacts
f_sob_shift_x, f_sob_mag_x = fft(sobel_x_kernel, (532, 532))
f_sob_shift_y, f_sob_mag_y = fft(sobel_y_kernel, (532, 532))
img_i_x = np.real(ifft(fshift * f_sob_shift_x))
img_i_y = np.real(ifft(fshift * f_sob_shift_y))
sobel_freq_combined = cv2.magnitude(img_i_x, img_i_y)
sobel_freq_combined = sobel_freq_combined[11:523, 11:523] #getting rid of padding
imshow(np.concatenate([img_grayscale, sobel_freq_combined], 1))
It is almost the same picture - the only differences come from numerical rounding errors.
imshow(np.concatenate([sobel_intensity_combined, sobel_freq_combined], 1))
print(np.allclose(sobel_freq_combined, sobel_intensity_combined))  # thus it is the same picture
True
print(np.sum(np.abs(sobel_freq_combined - sobel_intensity_combined)))  # the sum of all absolute differences is negligible,
# given that there are 512 * 512 = 262,144 values
2.082593628869205e-08